An Approach to Identify the Number of Clusters
Authors
Abstract
In this technological age, vast amounts of data are generated. Various statistical methods, including clustering, are used to find patterns in data. Many common methods for cluster analysis, such as k-means and Nonnegative Matrix Factorization, require the number of clusters in the data as input. However, that number is usually unknown. An existing method uses eigenvalues to estimate the number of clusters, but it sometimes underestimates that number. In this paper, we propose a complementary method to identify the number of clusters. This method is used to analyze three data sets and gives fairly accurate estimates of the number of clusters.
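The abstract does not say which eigenvalue-based method it refers to, and the proposed complementary method is not described on this page. Purely as an illustration of the general idea, the sketch below uses one common choice, an eigengap heuristic on a normalized graph Laplacian. The function name `estimate_num_clusters`, the Gaussian kernel, and the parameters `sigma` and `max_k` are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def estimate_num_clusters(points, sigma=1.0, max_k=10):
    """Eigengap heuristic (illustrative assumption, not the paper's method).

    Builds a Gaussian affinity graph, forms the symmetric normalized
    Laplacian, and returns the position of the largest gap among the
    smallest eigenvalues as the estimated number of clusters.
    """
    # Pairwise squared Euclidean distances between all points.
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)

    # Gaussian (RBF) affinities; no self-loops.
    affinity = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)

    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(points)) - (
        d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    )

    # Eigenvalues in ascending order; well-separated clusters each
    # contribute one eigenvalue close to zero.
    eigenvalues = np.sort(np.linalg.eigvalsh(laplacian))

    # The largest gap among the first max_k + 1 eigenvalues marks the
    # boundary between "near zero" and the rest.
    gaps = np.diff(eigenvalues[: max_k + 1])
    return int(np.argmax(gaps)) + 1


# Synthetic example: three tight, well-separated groups of 2-D points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
blobs = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

k = estimate_num_clusters(blobs, sigma=1.0)
print("estimated number of clusters:", k)
```

With overlapping or unevenly sized clusters the gap between eigenvalues becomes less distinct, and a heuristic of this kind can merge groups and report too few clusters, which is the kind of underestimation the abstract mentions.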
Similar resources
A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data
Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, and provided data with very high resolution of the expression of different genes for each cell at a single time. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...
Identification of Power Stripping Resources with Fuzzy Cluster Dynamic Approach (Case Study: West Azerbaijan Province)
Reducing electric power theft is a significant part of the potential benefits of implementing the concept of smart grid. This paper proposes a data-based approach to identify locations with unusual electricity consumption. The new distance-based method classifies new data as violating customers if their distance from the primary consumption data is large. The proposed algorithm determines the ...
Improvement of density-based clustering algorithm using modifying the density definitions and input parameter
Clustering is one of the main tasks in data mining, which means grouping similar samples. In general, there is a wide variety of clustering algorithms. One of these categories is density-based clustering. Various algorithms have been proposed for this method; one of the most widely used is DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...
A New Hybrid Routing Algorithm based on Genetic Algorithm and Simulated Annealing for Vehicular Ad hoc Networks
In recent years, Vehicular Ad-hoc Networks (VANET) as an emerging technology have tried to reduce road damage and car accidents through intelligent traffic controlling. In these networks, the rapid movement of vehicles, topology dynamics, and the limitations of network resources engender critical challenges in the routing process. Therefore, providing a stable and reliable routing algorithm is ...
Optimum Ensemble Classification for Fully Polarimetric SAR Data Using Global-Local Classification Approach
In this paper, a proposed ensemble classification for fully polarimetric synthetic aperture radar (PolSAR) data using a global-local classification approach is presented. In the first step, to perform the global classification, the training feature space is divided into a specified number of clusters. In the next step to carry out the local classification over each of these clusters, which cont...